Abstract

Crowdsourcing and human computation have been employed in increasingly sophisticated projects that require the solution of a heterogeneous set of tasks. We explore the challenge of building or hiring an effective team, for performing tasks required for such projects on an ongoing basis, from an available pool of applicants or workers who have bid for the tasks. The recruiter needs to learn workers' skills and expertise by performing online tests and interviews, and would like to minimize the amount of budget or time spent in this process before committing to hiring the team. How can one optimally spend budget to learn the expertise of workers as part of recruiting a team? How can one exploit the similarities among tasks as well as underlying social ties or commonalities among the workers for faster learning?

We tackle these decision-theoretic challenges by casting them as an instance of online learning for best action selection. We present algorithms with PAC bounds on the required budget to hire a near-optimal team with high confidence. Furthermore, we consider an embedding of the tasks and workers in an underlying graph that may arise from task similarities or social ties, and that can provide additional side-observations for faster learning.

We then quantify the improvement in the bounds that we can achieve depending on the characteristic properties of this graph structure. We evaluate our methodology on simulated problem instances as well as on real-world crowdsourcing data collected from the oDesk platform.

Our methodology and results present an interesting direction of research to tackle the challenges faced by a recruiter for contract-based crowdsourcing.

Introduction 

The success of a project or a collaborative venture depends critically on acquiring a team of contributors. Beyond increased performance and productivity, hiring a strong team leads to enhanced engagement and retention of workers.

“A small team of A+ players can run circles around a giant team of B and C players.” - Steve Jobs

Crowdsourcing and outsourcing via online marketplaces further underscore the promise of developing procedures for identifying potential contributors and composing teams. Crowdsourcing and human computation platforms highlight the opportunities for optimizing team building even when a job requester and workers may be half a world apart and have no advance contact. To date, online crowdsourcing markets have largely focused on micro-tasking by enlisting non-expert crowds of workers, who work independently and contribute to the solution of simple tasks such as image annotation and rating web pages. With the increasing complexity of tasks that are crowdsourced, as well as enterprises outsourcing their work, it is becoming important to hire skilled workers with an eye to complementarity and coordination when collaborating on problem solving. Contract-based crowdsourcing is another emerging paradigm where workers are recruited on a contract for performing tasks on an ongoing basis. The online platforms are offering new capabilities to deal with this shift towards expertise-driven crowdsourcing. For instance, oDesk provides opportunities for workers to do self-assessments by taking voluntary tests, ranging from those evaluating language skills to competencies in more complex disciplines such as programming. The platform provides support for recruiters to conduct interviews and perform online tests for job applicants. Furthermore, most of these marketplaces employ a feedback mechanism that allows task and platform owners to track the skill-specific expertise and reputation of workers to help with future recruiting.

Tasks and the team. We consider the crowdsourcing setting where the job requester has a predefined heterogeneous set of types of tasks that need to be solved on an ongoing basis. The notion of task types here could alternatively be taken to refer to the unique set of skills that are needed for addressing the needs of a project. For instance, consider an enterprise whose goal is to outsource a project that has three components or categories of tasks, each requiring a particular skill: (i) web development, (ii) English to Spanish translation, and (iii) video editing. The project would have ongoing assignments of tasks that would belong to one of these three components. When a new task needs to be executed, it is assigned to the hired team and can be performed by the worker possessing the highest expertise for the skill required for this task. The quality of the hired team could then be quantified by the highest expertise among the team members for each of the skills that are required for this project.



[Figure 1 panels: side-observations for workers; side-observations for tasks; worker-task performance matrix; near-optimal teams for allowed error $\epsilon = 0.01$ (optimal team $\{w_1, w_2, w_4\}$; team within the allowed error tolerance $\{w_1, w_3, w_4\}$).]

Figure 1: Illustration of approach on toy example with five workers and three (types of) tasks.


Learning workers' expertise. In the general case, workers' expertise over the different types of tasks or skills is unknown to the recruiter. To learn a worker's expertise for a given type of task, the recruiter can perform an online test or evaluate the performance of the worker by assigning gold-standard questions for which the ground truth is available. Under standard statistical assumptions, performing more of these tests on a worker would give a better estimate of the expertise level of the worker for a given type of task. The recruiter's goal is to hire a near-optimal team with high likelihood. The main research question is then how to optimally spend the budget (or minimize the total number of tests performed) in order to obtain a sufficiently good estimate of the workers' expertise over all of the required task types and to be able to make the hiring decision under an allowed level of error tolerance.

Exploiting commonalities. Typically, the number of unique task types and the total number of job applicants (or the workers that bid for the posted tasks) could be large, and hence learning the workers' expertise may require performing a large number of tests. However, in order to speed up learning, one may be able to exploit the similarities among the tasks and underlying social ties or commonalities among the workers. For instance, consider two types of tasks, requiring skill “java script” and skill “ajax”. By using group testing, the recruiter may design one test for skill “java script” that additionally allows inferring the expertise on skill “ajax” at no additional cost. Prior knowledge about correlations among workers' expertise and workers' features (such as demographics) could also be exploited. Depending on the specific application setting, one may be able to exploit the social ties among workers (or “participants”). The goal is to design algorithms that can exploit these different kinds of commonalities should they be present.

Our contributions can be summarized as follows:

• We present an algorithmic approach to hiring a team of workers, as faced by a recruiter for contract-based crowdsourcing.

• We provide algorithms with PAC bounds on the required budget to hire a near-optimal team with high confidence. Our algorithms phrase the decision-theoretic problem of team hiring as an instance of online learning for best action selection.

• We propose a simple model to jointly consider the commonalities among tasks and workers, and extend our algorithms to exploit them.

• We evaluate the proposed methods using synthetic data as well as data collected from the oDesk platform.


Related Work 


Heterogeneous crowdsourcing markets. Our work tackles challenges that arise in heterogeneous crowdsourcing markets where a worker's performance for a given task depends on the required skills and the expertise level of the worker for those skills. Difallah, Demartini, and Cudre-Mauroux (2013) focus on building automated tools to pick the right set of eligible workers for a given task based on the social networking profile of the workers. Goel, Nikzad, and Singla (2014) design a mechanism for assigning tasks to workers, under constraints given in terms of a bipartite graph capturing skills and expertise compatibility of the tasks and workers. Another line of research in these markets involves the study of coordination among workers and the formation of teams to perform a desired task. Shahaf and Horvitz (2010) introduce the notion of generalized task markets, and study how machines and humans can interact to solve such generalized tasks by forming teams. Zhang et al. (2011) discuss human computation tasks that require effective coordination among workers, such as itinerary planning or data sorting. Our work presents an algorithmic approach to the challenge of team hiring, with guarantees on the optimality of the team and the budget required.

Learning in crowdsourcing. Many problems about learning the performance and characteristics of the crowd can be cast as an instance of online learning with an associated explore-exploit dilemma, and hence several solutions use the framework of multi-armed bandits (MAB) (Lai and Robbins 1985). Ho and Vaughan (2012) and Ho, Jabbari, and Vaughan (2013) tackle the algorithmic questions concerning learning workers' expertise, task assignment, and label inference for heterogeneous classification tasks. However, their goal is to improve the overall prediction accuracy at lower cost, which differs from the goal of our work. Singla and Krause (2013) and Ho, Slivkins, and Vaughan (2014) consider budgeted variants of MAB for learning the price curve and dynamically adjusting payments based on quality.

Best action selection. From a technical perspective, the most similar work to ours is the best action selection problem, a more recently introduced variant of MAB problems (Even-Dar, Mannor, and Mansour 2006; Bubeck, Munos, and Stoltz 2009; Kalyanakrishnan et al. 2012; Chen et al. 2014). In these settings, the principal agent explores the problem space (the set of actions or “arms”) for a certain time or budget, commits to a policy over the actions, and then exploits. Even-Dar, Mannor, and Mansour (2006) study this model under the PAC (probably approximately correct) setting (Valiant 1984) and introduce various $(\epsilon, \delta)$-PAC algorithms for best “arm” identification, i.e., they provide bounds on the number of samples required to output an $\epsilon$-optimal action with probability at least $(1 - \delta)$ using concentration bounds (Hoeffding 1963). Kalyanakrishnan et al. (2012) design an adaptive $(\epsilon, \delta)$-PAC algorithm Lucb-1 for selecting the m best actions, using upper and lower confidence bounds. Zhou, Chen, and Li (2014) also study the problem of selecting the m best actions, introducing a new aggregate metric and then applying it to the crowdsourcing setting in simulation experiments. The uniform exploration policy introduced by Even-Dar, Mannor, and Mansour (2006) and the adaptive policy Lucb-1 of Kalyanakrishnan et al. (2012) are the main building blocks of our proposed algorithms. Gabillon et al. (2011) and Wang, Viswanathan, and Bubeck (2013) consider the problem of best arm identification in multiple MAB problem instances by jointly learning over all the problem instances. Our algorithms are also inspired by this idea of jointly identifying best actions over multiple problem instances, and we extend the Lucb-1 algorithm to this setting.

Exploiting commonalities and modeling side-observations. A recent line of research has introduced the notion of side-observations to exploit additional information that can speed up learning. Mannor and Shamir (2011) consider a class of problems that interpolate between the bandit feedback and full information settings. They consider the bandit feedback model with side-observations (for instance, such side-observations could arise from user/advertisement similarity, sensor proximity, etc.) and design algorithms for adversarial settings. Caron et al. (2012) and Buccapatnam, Eryilmaz, and Shroff (2014) extend the results of the side-observation model to stochastic settings. Side-observations through correlations (Fang and Tao 2014) capture bandit problems where the actions are correlated and where pulling one action invokes these correlated actions, accounting for additional rewards and observations, motivated by applications in social advertisement. Cesa-Bianchi, Gentile, and Zappella (2013) present an algorithm for contextual bandits correlated through an underlying graph. We borrow some of the ideas from Mannor and Shamir (2011) and Buccapatnam, Eryilmaz, and Shroff (2014) to exploit the commonalities among tasks and among workers. We present a simple model to jointly consider the commonalities among tasks and workers by representing them as a Cartesian product of two side-observation graphs. Furthermore, for the first time, we apply these side-observation models to the best action selection problem.


Problem Statement 

We now formalize the problem addressed in this paper. 

Tasks and workers. We have a set of M types of tasks (simply referred to as tasks henceforth) and N workers (or job applicants) denoted by the sets $O = \{o_1, o_2, \ldots, o_M\}$ and $W = \{w_1, w_2, \ldots, w_N\}$, respectively. We shall assume N > M, simply meaning that there is at least one unique job applicant per type of task. For instance, in Figure 1, we have M = 3 tasks and N = 5 workers. We model the performance of a worker for a given task as a bounded random variable with unknown mean. Assigning task $o_j \in O$ to worker $w_i \in W$ at time t yields a performance value (as feedback) denoted by the random variable $X^t_{(i,j)}$, sampled from an unknown distribution with mean value $\mu_{(i,j)}$. For simplicity and w.l.o.g., we shall assume that the underlying distribution from which $X^t_{(i,j)}$ is sampled has bounded support within [0,1]. The mean performance values are denoted by an unknown performance matrix $\mu : N \times M \to \mathbb{R}_{\ge 0}$ with tasks as columns and workers as rows. We assume a stochastic setting where $X^t_{(i,j)}$ are i.i.d. for any fixed pair of worker $w_i$ and task $o_j$. Also, $X^t_{(i,j)}$ are independent across i, j, and t.


Side-observation model. The workers and tasks are embedded in some (known) underlying graphs, denoted by $G_w(V_w, E_w)$ and $G_o(V_o, E_o)$. The nodes $V_w$ of $G_w$ correspond to the N workers, and the nodes $V_o$ of $G_o$ correspond to the M tasks. We shall assume undirected graphs, though the models and results could be extended to the setting of directed graphs as well. The edges in these graphs capture the model of side-observations that may be possible to obtain at no additional cost (Mannor and Shamir 2011; Buccapatnam, Eryilmaz, and Shroff 2014). In our model, when worker $w_i$ is assigned task $o_j$ at time t, apart from observing the performance $X^t_{(i,j)}$, the following additional sets of observations become available:

• $X^t_{(i,q)} \; \forall q : \{o_j, o_q\} \in E_o$, the additional observations associated with the tasks neighboring $o_j$ in $G_o$.

• $X^t_{(p,j)} \; \forall p : \{w_i, w_p\} \in E_w$, the additional observations associated with the workers neighboring $w_i$ in $G_w$.

In Figure 1, assigning task $o_1$ to worker $w_2$ at time t would yield a set of observations $X^t$ consisting of $X^t_{(2,1)}$ together with the side-observations for the neighbors of $o_1$ in $G_o$ and the neighbors of $w_2$ in $G_w$. The goal is to design algorithms that can exploit these side-observations whenever present, and smoothly interpolate between the bandit setting (absence of side-observations, $E_w = E_o = \emptyset$) and the full information setting (fully connected graphs).
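The side-observation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: workers and tasks are represented by integer indices, the two graphs are given as lists of undirected edges, and the function name `side_observations` and the toy edge lists are hypothetical (they are not taken from Figure 1).

```python
def side_observations(i, j, worker_edges, task_edges):
    """Return the (worker, task) index pairs observed when task j is assigned to worker i.

    worker_edges and task_edges are lists of undirected edges (pairs of indices)
    describing G_w and G_o respectively.
    """
    observed = {(i, j)}  # the direct observation X_(i,j)
    # Tasks neighboring o_j in G_o give observations for the same worker.
    for a, b in task_edges:
        if a == j:
            observed.add((i, b))
        elif b == j:
            observed.add((i, a))
    # Workers neighboring w_i in G_w give observations for the same task.
    for a, b in worker_edges:
        if a == i:
            observed.add((b, j))
        elif b == i:
            observed.add((a, j))
    return observed


# Hypothetical toy graphs: workers 1-2-3 form a path, tasks 1 and 2 are similar.
toy_worker_edges = [(1, 2), (2, 3)]
toy_task_edges = [(1, 2)]
print(sorted(side_observations(2, 1, toy_worker_edges, toy_task_edges)))
```

With empty edge lists the function returns only the direct observation, which corresponds to the bandit setting; with fully connected graphs it returns a full row and column of the performance matrix.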

The objective. Our goal is to select or hire a team of workers denoted by $S^*$, of size at most M, from the set W, comprising the highest performing worker for each task $o \in O$. If the performance matrix is known, the problem is trivial; for instance, in Figure 1 the optimal team is $\{w_1, w_2, w_4\}$. Hence, the goal is to design algorithms that can efficiently learn the performance matrix $\mu[N, M]$ and output a near-optimal team. In our model, a team S is $\epsilon$-optimal when, for each task $o_j \in O$, we have:

$$\forall o_j \in O, \quad \max_{w_i \in W} \mu_{(i,j)} - \max_{w_i \in S} \mu_{(i,j)} \le \epsilon \qquad (1)$$

In Figure 1, $\{w_1, w_3, w_4\}$ is an $\epsilon$-optimal team for $\epsilon = 0.01$. Given our stochastic assumptions, the algorithm can repeatedly assign a task $o_j$ to worker $w_i$ in order to get a good estimate of the performance $\mu_{(i,j)}$. We call each such assignment a test. We assume that each test poses a unit cost to the algorithm. We seek algorithms with PAC bounds, i.e., for given positive constants $(\epsilon, \delta)$, the algorithm should output an $\epsilon$-optimal team with probability of at least $(1 - \delta)$. We measure the efficiency of such an algorithm in terms of the total number of tests required or, equivalently, the budget spent.
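To make the objective concrete, the following Python sketch computes the optimal team from a known performance matrix and checks whether a candidate team is near-optimal in the sense that, for every task, the best team member is within the error tolerance of the best available worker. The function names and the small matrix are hypothetical and chosen only for illustration; they are not the values from Figure 1.

```python
def optimal_team(mu):
    """mu[i][j] is the mean performance of worker i on task j.

    Returns the set of worker indices containing the best worker for each task.
    """
    n_workers, n_tasks = len(mu), len(mu[0])
    return {max(range(n_workers), key=lambda i: mu[i][j]) for j in range(n_tasks)}


def is_eps_optimal(team, mu, eps):
    """Check that, for every task, the team's best worker is within eps of the overall best."""
    for j in range(len(mu[0])):
        best_overall = max(row[j] for row in mu)
        best_in_team = max(mu[i][j] for i in team)
        if best_overall - best_in_team > eps:
            return False
    return True


# Hypothetical matrix with 4 workers (rows) and 2 tasks (columns).
toy_mu = [
    [0.900, 0.200],
    [0.895, 0.300],
    [0.100, 0.800],
    [0.200, 0.795],
]
print(optimal_team(toy_mu))                  # best worker per task
print(is_eps_optimal({1, 3}, toy_mu, 0.01))  # runners-up are within 0.01
print(is_eps_optimal({1, 3}, toy_mu, 0.001)) # but not within 0.001
```

In the learning problem the matrix is unknown, so these checks can only be applied to estimates; the sketch merely pins down what the algorithms are asked to output.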
























































Algorithms for Budgeted Hiring 
Overview of basic approach 

To present some of the key insights in designing our algorithms, we first consider a simple setting.

Single task (M = 1) without side-observations. Let us first consider the simple setting of hiring to solve one task, i.e., M = 1 and the goal is to find an $\epsilon$-optimal worker from the set W with success probability of at least $(1 - \delta)$. We consider the recruiting of team members from among N workers as the set of actions at hand, and reduce the decision problem to the problem of best action selection (Even-Dar, Mannor, and Mansour 2006; Bubeck, Munos, and Stoltz 2009; Kalyanakrishnan et al. 2012; Chen et al. 2014). For example, the Naive$(\epsilon, \delta)$ algorithm of Even-Dar, Mannor, and Mansour (2006) provides $(\epsilon, \delta)$-PAC guarantees by uniformly allocating a sufficient number of observations to each action to be able to select an $\epsilon$-optimal action with probability at least $(1 - \delta)$. By using Hoeffding's inequality (Hoeffding 1963), a sufficient number of observations is $\lceil \frac{2}{\epsilon^2} \ln(\frac{2N}{\delta}) \rceil$. This Naive$(\epsilon, \delta)$ algorithm is the main building block for our proposed algorithm UExpSelect, which is based on uniform exploration of the actions.
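As a rough illustration of uniform exploration for a single task, the sketch below samples every worker the same number of times and returns the empirical best. Several things here are assumptions of the sketch rather than statements about the original algorithm: the per-action count $\lceil \frac{2}{\epsilon^2} \ln(\frac{2N}{\delta}) \rceil$ is what Hoeffding's inequality with a union bound over N actions gives (the constants in the original may differ), performance draws are simulated as Bernoulli variables, and the function names are hypothetical.

```python
import math
import random


def naive_sample_count(n_actions, eps, delta):
    """Per-action sample count from Hoeffding's inequality plus a union bound (assumed constants)."""
    return math.ceil((2.0 / eps ** 2) * math.log(2.0 * n_actions / delta))


def naive_select(means, eps, delta, rng):
    """Uniform exploration for one task: sample each worker equally, return the empirical best.

    means[i] is the (hidden) mean performance of worker i; each test is simulated
    as a Bernoulli draw, which is an assumption made for this sketch.
    """
    ell = naive_sample_count(len(means), eps, delta)
    estimates = []
    for m in means:
        successes = sum(rng.random() < m for _ in range(ell))
        estimates.append(successes / ell)
    best = max(range(len(means)), key=estimates.__getitem__)
    return best, ell


rng = random.Random(0)
best, ell = naive_select([0.9, 0.5, 0.3], eps=0.1, delta=0.05, rng=rng)
print(best, ell, "tests in total:", 3 * ell)
```

The total number of tests is N times the per-action count regardless of how far apart the workers are, which is the inefficiency that motivates an adaptive variant.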

This algorithm is based on uniform exploration and ignores the fact that some actions may be easier or harder to distinguish. For example, in Figure 1, considering task $o_1$, distinguishing some workers from $w_1$ is easier than distinguishing $w_2$ from $w_1$. To tackle this problem, Kalyanakrishnan et al. (2012) design an adaptive $(\epsilon, \delta)$-PAC algorithm Lucb-1 using upper and lower confidence bounds. Lucb-1 adapts to the complexity of the problem instance, provides distribution-dependent bounds, and is the state-of-the-art algorithm for the best action selection problem. We use Lucb-1 as the main building block for our proposed algorithm AExpSelect, an adaptive variant of UExpSelect.

Multiple tasks (M > 1) without side-observations. One possible way to tackle this challenge is to consider each task as a separate instance of the best action selection problem, and to use one of the previously discussed algorithms, Naive or Lucb-1, separately on each instance. However, one can hope to do better by jointly considering all of the tasks and allocating the budget across tasks in an adaptive manner. For instance, in Figure 1, task $o_2$ is harder than task $o_1$ and task $o_3$ in terms of distinguishing and selecting the best worker. Recently, Gabillon et al. (2011) and Wang, Viswanathan, and Bubeck (2013) have addressed this problem of best arm identification in multiple multi-armed bandit (MAB) instances by jointly learning over all of the instances. Our proposed algorithms are inspired by the idea of jointly identifying the best workers (the team) for all of the tasks, and AExpSelect extends the Lucb-1 algorithm to this setting.

Exploiting side-observation graphs 

Side-observation models (Mannor and Shamir 2011; Caron et al. 2012; Buccapatnam, Eryilmaz, and Shroff 2014) have been studied mainly in the context of regret minimization problems using the MAB framework, modeling the observations via an underlying graph connecting the “arms” of the MAB. Although different ideas have been explored on how to exploit side-observations via an underlying graph, all these ideas revolve around the minimal dominating set of the side-observation graph $G(V, E)$, denoted by $\mathrm{DOM}(G)$. This concept refers to the smallest subset of vertices that covers the rest: every vertex of the graph G is either in $\mathrm{DOM}(G)$ or is directly connected to one of the vertices in $\mathrm{DOM}(G)$.

We extend these ideas to apply the side-observation models to the best action selection problem. Since our proposed algorithms jointly learn over these M tasks, we would like to jointly exploit the side-observation graphs over the tasks and the workers. We can model the side-observation graphs jointly as the Cartesian product of the two graphs, given by $G_w \square G_o$ and denoted as $G_{wo} = (V_{wo}, E_{wo})$. In a Cartesian product of graphs, the vertices are given by the Cartesian product of the vertex sets of the individual graphs, $V_{wo} = V_w \times V_o$, or alternatively, $V_{wo} = \{(w_i, o_j) : i \in [1 \ldots N] \text{ and } j \in [1 \ldots M]\}$, i.e., $G_{wo}$ has $M \cdot N$ vertices. The edges $E_{wo}$ are such that $(w_i, o_j)$ and $(w_{i'}, o_{j'})$ have an edge if either i) $w_i = w_{i'}$ and $(o_j, o_{j'}) \in E_o$, or ii) $o_j = o_{j'}$ and $(w_i, w_{i'}) \in E_w$. Let $\gamma(G_{wo})$ denote the minimum size of a dominating set in the resulting graph. Computing a minimum dominating set is NP-hard by a reduction from the set-cover problem (Guha and Khuller 1998). However, an approximate solution can be found whose size is upper bounded by $(1 + \ln(1 + \mathrm{DEG}(G_{wo}))) \cdot \gamma(G_{wo})$, where DEG denotes the maximum degree of any vertex in the graph (Guha and Khuller 1998). Let us denote this approximate dominating set as $\widetilde{\mathrm{DOM}}(G_{wo})$ and the corresponding approximate dominating number as $\tilde{\gamma}_{G_{wo}}$.

We denote the set of actions as $A = \{a_{(i,j)} : i \in [1 \ldots N] \text{ and } j \in [1 \ldots M]\}$. Taking action $a^t_{(i,j)}$ at time t is equivalent to assigning worker $w_i$ to task $o_j$ at time t. For any action $a_{(i,j)}$, with a slight abuse of notation, we denote its neighboring action belonging to the dominating set as $\widetilde{\mathrm{DOM}}(G_{wo}, a_{(i,j)}, \cdot) \in \widetilde{\mathrm{DOM}}(G_{wo})$. We call this the dominating action for $a_{(i,j)}$. For any action $a_{(i,j)} \in \widetilde{\mathrm{DOM}}(G_{wo})$, we denote the set of actions dominated by $a_{(i,j)}$ as $\widetilde{\mathrm{DOM}}(G_{wo}, \cdot, a_{(i,j)}) \subseteq V_{wo}$. The main idea used in our algorithms UExpSelect and AExpSelect is to replace the picked action $a^t_{(p,q)}$ by its dominating action $a^t_{(\tilde{p},\tilde{q})}$.
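A possible way to build the joint side-observation graph and an approximate dominating set is sketched below in Python. The product construction follows the edge rule described above. The dominating set is computed with the standard greedy set-cover heuristic, which is known to give a logarithmic approximation; whether it coincides exactly with the cited procedure is an assumption of this sketch. The function names and the toy graphs are hypothetical.

```python
from itertools import product


def cartesian_product_graph(workers, tasks, worker_edges, task_edges):
    """Adjacency sets of the Cartesian product graph over (worker, task) nodes."""
    adj = {(w, o): set() for w, o in product(workers, tasks)}
    # Same worker, neighboring tasks.
    for w in workers:
        for a, b in task_edges:
            adj[(w, a)].add((w, b))
            adj[(w, b)].add((w, a))
    # Same task, neighboring workers.
    for o in tasks:
        for a, b in worker_edges:
            adj[(a, o)].add((b, o))
            adj[(b, o)].add((a, o))
    return adj


def greedy_dominating_set(adj):
    """Greedy set-cover heuristic: repeatedly pick the node covering the most uncovered nodes."""
    uncovered = set(adj)
    dominating = []
    while uncovered:
        # sorted() makes tie-breaking deterministic.
        v = max(sorted(adj), key=lambda u: len(({u} | adj[u]) & uncovered))
        dominating.append(v)
        uncovered -= {v} | adj[v]
    return dominating


# Hypothetical toy instance: workers 1-2-3 on a path, tasks 1 and 2 connected.
adj = cartesian_product_graph([1, 2, 3], [1, 2], [(1, 2), (2, 3)], [(1, 2)])
dom = greedy_dominating_set(adj)
print(len(adj), "actions,", len(dom), "dominating actions:", dom)
```

In this toy instance two actions dominate all six, so each round of uniform exploration would need two tests instead of six.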

Model parameters and execution variables 

We now introduce several model parameters as well as notation that will be useful to describe the algorithms.

$\epsilon$-optimal team. For any task $o_j$, the highest performance among all the workers is given by $\mu_{(i^*,j)} = \max_{w_i \in W} \mu_{(i,j)}$, and we let $w_{(i^*,j)}$ be the worker with this highest performance. We denote the best worker for this task with the corresponding action $a_{(i^*,j)} \in A$. For a given task $o_j$, we can now denote the relative quality of a given worker $w_i$ w.r.t. the performance of the best worker available for this task as $\Delta_{(i,j)} = \mu_{(i^*,j)} - \mu_{(i,j)}$. For the specific case of the best worker $w_{(i^*,j)}$, this quantity is defined as $\Delta_{(i^*,j)} = \mu_{(i^*,j)} - \max_{w_i \in W \setminus \{w_{(i^*,j)}\}} \mu_{(i,j)}$, denoting the gap with the second best worker for this task. For any task $o_j$, we say a worker $w_i \in W \setminus \{w_{(i^*,j)}\}$ is $\epsilon$-optimal for $o_j$ if $\Delta_{(i,j)} \le \epsilon$. We denote this set of $\epsilon$-optimal workers along with the best worker as $S_{(\epsilon,j)}$. Now, a solution $S$ output by the algorithm is $\epsilon$-optimal (denoted as $S^{\epsilon}$) if it contains at least one $\epsilon$-optimal worker for each task, i.e., $\forall j \in [1 \ldots M] : |S \cap S_{(\epsilon,j)}| \ge 1$. Putting $\epsilon = 0$ in $S^{\epsilon}$ will correspond to the optimal team.

Variables over execution. The algorithm will run in time steps, denoted by t, where each time step corresponds to the assignment of a task to a worker. Hence, the total number of time steps until termination of the algorithm corresponds to the budget spent or the sample complexity of the algorithm. At time step t, let $n^t_{(i,j)}$ correspond to the number of times task $o_j$ has been assigned to $w_i$ (or simply, action $a_{(i,j)}$ has been performed). Also, let $y^t_{(i,j)}$ correspond to the total number of observations that have been made about the performance of $w_i$ for task $o_j$ (note that, in the absence of side-observations, $y^t_{(i,j)} = n^t_{(i,j)}$). The current estimates of the mean values are denoted by $\hat{\mu}^t_{(i,j)}$. With these estimates, we also define $\hat{\mu}^t_{(i^*,j)} = \max_{w_i \in W} \hat{\mu}^t_{(i,j)}$ and $w^t_{(i^*,j)} = \arg\max_{w_i \in W} \hat{\mu}^t_{(i,j)}$. Similarly, we define the quantities $\Delta^t_{(i,j)}$ based on the current estimates of the performance values $\hat{\mu}^t_{(i,j)}$.

Algorithm 1: Algorithm UExpSelect

1 Input: Tasks: $O$; Workers: $W$; Side-observation graphs: $G_w$, $G_o$; PAC parameters: $(\epsilon, \delta)$;
2 Output: Team of workers $S \subseteq W$ : $|S| \le M$, such that $S$ is $\epsilon$-optimal with probability at least $(1 - \delta)$;
3 Initialize:
    • Compute: $G_{wo} = G_w \square G_o$; $\widetilde{\mathrm{DOM}}(G_{wo})$;
    • $t = 0$; $S = \emptyset$;
    • $\forall a_{(i,j)} \in A$: $\hat{\mu}^t_{(i,j)} = 0$; $n^t_{(i,j)} = 0$; $y^t_{(i,j)} = 0$;
  while $\exists a_{(i,j)} \in A : y^t_{(i,j)} < \lceil \frac{2}{\epsilon^2} \ln(\frac{2MN}{\delta}) \rceil$ do
4     $a^t_{(p,q)} = \arg\min_{a_{(i,j)} \in A} y^t_{(i,j)}$;    ▷ Greedy action
5     $a^t_{(\tilde{p},\tilde{q})} = \widetilde{\mathrm{DOM}}(G_{wo}, a^t_{(p,q)}, \cdot)$;
6     Perform action: $a^t_{(\tilde{p},\tilde{q})}$;    ▷ Assign $o_{\tilde{q}}$ to $w_{\tilde{p}}$
7     Feedback: Obtain observations $X^t$ for actions dominated by $a^t_{(\tilde{p},\tilde{q})}$: $\widetilde{\mathrm{DOM}}(G_{wo}, \cdot, a^t_{(\tilde{p},\tilde{q})})$;
8     Update Variables:
        • $n^{t+1}_{(\tilde{p},\tilde{q})} = n^t_{(\tilde{p},\tilde{q})} + 1$;
        • $\forall a_{(i,j)} \in \widetilde{\mathrm{DOM}}(G_{wo}, \cdot, a^t_{(\tilde{p},\tilde{q})})$: $y^{t+1}_{(i,j)} = y^t_{(i,j)} + 1$;
        • $\forall a_{(i,j)} \in \widetilde{\mathrm{DOM}}(G_{wo}, \cdot, a^t_{(\tilde{p},\tilde{q})})$: update $\hat{\mu}^{t+1}_{(i,j)}$ from $X^t$;
        • $t = t + 1$;
9  foreach $j \in [1 \ldots M]$ do
10     $w_{(i^*,j)} = \arg\max_{w_i \in W} \hat{\mu}^t_{(i,j)}$; $S \leftarrow S \cup \{w_{(i^*,j)}\}$;
11 Output: $S$

Algorithm 2: Algorithm AExpSelect

1 Input: Tasks: $O$; Workers: $W$; Side-observation graphs: $G_w$, $G_o$; PAC parameters: $(\epsilon, \delta)$;
2 Output: Team of workers $S \subseteq W$ : $|S| \le M$, such that $S$ is $\epsilon$-optimal with probability at least $(1 - \delta)$;
3 Initialize:
    • Compute: $G_{wo} = G_w \square G_o$; $\widetilde{\mathrm{DOM}}(G_{wo})$;
    • $t = 0$; $S^t = \emptyset$; $R^t = O$;
    • $\forall a_{(i,j)} \in A$: $\hat{\mu}^t_{(i,j)} = 0$; $n^t_{(i,j)} = 0$; $y^t_{(i,j)} = 0$;
    • $\forall a_{(i,j)} \in A$: $\beta(y^t_{(i,j)}, t) \to \infty$; $\forall o_j \in O$: $\Delta^t_j \to \infty$;
  while $R^t \neq \emptyset$ do
4     $o^t_q = \arg\max_{o_j \in R^t} \Delta^t_j$;
5     $w^t_{(i^*,q)} = \arg\max_{w_i \in W} \hat{\mu}^t_{(i,q)}$;
6     $w^t_{(i',q)} = \arg\max_{w_i \in W \setminus \{w^t_{(i^*,q)}\}} (\hat{\mu}^t_{(i,q)} + \beta(y^t_{(i,q)}, t))$;
7     $w^t_p = \arg\max_{w_i \in \{w^t_{(i^*,q)}, w^t_{(i',q)}\}} \beta(y^t_{(i,q)}, t)$;
8     $a^t_{(p,q)} = a(w^t_p, o^t_q)$;    ▷ Greedy action
9     $a^t_{(\tilde{p},\tilde{q})} = \widetilde{\mathrm{DOM}}(G_{wo}, a^t_{(p,q)}, \cdot)$;
10    Perform action: $a^t_{(\tilde{p},\tilde{q})}$;    ▷ Assign $o_{\tilde{q}}$ to $w_{\tilde{p}}$
11    Feedback: Obtain observations $X^t$ for actions dominated by $a^t_{(\tilde{p},\tilde{q})}$: $\widetilde{\mathrm{DOM}}(G_{wo}, \cdot, a^t_{(\tilde{p},\tilde{q})})$;
12    Update Variables:
        • $n^{t+1}_{(\tilde{p},\tilde{q})} = n^t_{(\tilde{p},\tilde{q})} + 1$;
        • $\forall a_{(i,j)} \in \widetilde{\mathrm{DOM}}(G_{wo}, \cdot, a^t_{(\tilde{p},\tilde{q})})$: $y^{t+1}_{(i,j)} = y^t_{(i,j)} + 1$;
        • $\forall a_{(i,j)} \in \widetilde{\mathrm{DOM}}(G_{wo}, \cdot, a^t_{(\tilde{p},\tilde{q})})$: update $\hat{\mu}^{t+1}_{(i,j)}$ from $X^t$;
        • $t = t + 1$;
        • $\forall o_j \in R^t$: update $\Delta^t_j$;
13    Update Solution: foreach $o_j \in R^t$ do
14        if $\Delta^t_j < \epsilon$ then
15            $w_{(i^*,j)} = \arg\max_{w_i \in W} \hat{\mu}^t_{(i,j)}$;
16            $S^t = S^t \cup \{w_{(i^*,j)}\}$; $R^t = R^t \setminus \{o_j\}$;
17 Output: $S^t$

Algorithm UExpSelect

We now present our first algorithm UExpSelect, shown in Algorithm 1, based on the uniform exploration of all the actions, extending ideas of the Naive algorithm (Even-Dar, Mannor, and Mansour 2006). At each iteration, the algorithm selects the action $a^t_{(p,q)}$ with the minimal number of observations $y^t_{(p,q)}$ (Step 4). This choice is natural and can be thought of as “greedy” in order to quickly move towards termination of the algorithm. Given the side-observation model, the algorithm takes the action $a^t_{(\tilde{p},\tilde{q})}$ (Step 5), i.e., the one that dominates $a^t_{(p,q)}$, as taking $a^t_{(\tilde{p},\tilde{q})}$ also gives us the desired observation needed for $a^t_{(p,q)}$. Then, it receives the observation set $X^t$, corresponding to all the actions that are dominated by $a^t_{(\tilde{p},\tilde{q})}$, and updates the corresponding variables. Once every action has at least $\lceil \frac{2}{\epsilon^2} \ln(\frac{2MN}{\delta}) \rceil$ observations, the algorithm selects the best set S based on the observed performances $\hat{\mu}^t_{(i,j)}$. Note that, if we ignore the side-observation model, then $a^t_{(\tilde{p},\tilde{q})} = a^t_{(p,q)}$, and the observation set corresponds to the singleton set $X^t = \{X^t_{(p,q)}\}$.
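The uniform exploration loop with dominating actions can be simulated with a short Python sketch. It is meant only as an illustration under stated assumptions: performance draws are simulated as Bernoulli variables, the per-action observation target uses the Hoeffding-based count $\lceil \frac{2}{\epsilon^2} \ln(\frac{2MN}{\delta}) \rceil$ (the constants are an assumption), and the dominating structure is passed in as two precomputed dictionaries whose names (`dominator`, `dominated`) are hypothetical.

```python
import math
import random


def uexp_select(mu, dominator, dominated, eps, delta, rng):
    """Uniform exploration with side-observations (illustrative sketch).

    mu:        dict mapping action (i, j) -> hidden mean performance in [0, 1]
    dominator: dict mapping each action to the action performed in its place
    dominated: dict mapping each performed action to the set of actions it reveals
               (must contain every action that maps to it in `dominator`)
    Returns (team, number_of_tests).
    """
    actions = sorted(mu)
    workers = sorted({i for i, _ in actions})
    tasks = sorted({j for _, j in actions})
    target = math.ceil((2.0 / eps ** 2) * math.log(2.0 * len(actions) / delta))
    y = dict.fromkeys(actions, 0)        # observation counts
    total = dict.fromkeys(actions, 0.0)  # running sums of observed performance
    tests = 0
    while min(y.values()) < target:
        picked = min(actions, key=y.__getitem__)  # greedy: least observed action
        performed = dominator[picked]             # replace by its dominating action
        for a in dominated[performed]:            # one test reveals all dominated actions
            total[a] += 1.0 if rng.random() < mu[a] else 0.0
            y[a] += 1
        tests += 1
    team = {max(workers, key=lambda i: total[(i, j)] / y[(i, j)]) for j in tasks}
    return team, tests


# Hypothetical instance: two workers, one task.
toy_mu = {(0, 0): 0.9, (1, 0): 0.2}
no_side = uexp_select(toy_mu, {a: a for a in toy_mu}, {a: {a} for a in toy_mu},
                      0.2, 0.1, random.Random(0))
full_side = uexp_select(toy_mu, {a: (0, 0) for a in toy_mu}, {(0, 0): set(toy_mu)},
                        0.2, 0.1, random.Random(0))
print(no_side, full_side)
```

In the toy run, connecting the two workers halves the number of tests, since one dominating action reveals both entries of the performance matrix.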



















Algorithm AExpSelect 


In order to adapt the algorithm to the variability of the hardness of the problem in identifying suboptimal workers across tasks and within one given task, we present a second algorithm AExpSelect, based on ideas of the Lucb-1 algorithm (Kalyanakrishnan et al. 2012). In order to present AExpSelect, we introduce some specific terminology as well as the approach used to pick the actions.

First, we associate confidence bounds, i.e., a high-probability bound, with the estimates of the performance $\hat{\mu}^t_{(i,j)}$. This is denoted by the function $\beta(y^t_{(i,j)}, t)$. The specific form of the function we use, as used in Lucb-1, is given by $\beta(y, t) = \sqrt{\frac{1}{2y} \ln\left(\frac{5}{4} \cdot \frac{M \cdot N \cdot t^4}{\delta}\right)}$. One of the key intuitions behind this specific function is that we seek to ensure that the probability of the event that the confidence interval bounds are ever violated over the lifespan of the algorithm is bounded by $\delta$. For a given action $a_{(i,j)}$, the upper and lower confidence bounds on the performance estimate $\hat{\mu}^t_{(i,j)}$ are given as $(\hat{\mu}^t_{(i,j)} + \beta(y^t_{(i,j)}, t))$ and $(\hat{\mu}^t_{(i,j)} - \beta(y^t_{(i,j)}, t))$, respectively.

At a given time t and for a given task $o_j$, we denote the worker with the highest empirically observed performance as $w^t_{(i^*,j)}$, given by:

$$w^t_{(i^*,j)} = \arg\max_{w_i \in W} \hat{\mu}^t_{(i,j)} \qquad (2)$$

Next, from the remaining N − 1 workers, we find the worker with the maximum value of the upper confidence bound on the performance estimate as follows:

$$w^t_{(i',j)} = \arg\max_{w_i \in W \setminus \{w^t_{(i^*,j)}\}} \left(\hat{\mu}^t_{(i,j)} + \beta(y^t_{(i,j)}, t)\right) \qquad (3)$$

The empirical mean of $w^t_{(i^*,j)}$ is denoted by $\hat{\mu}^t_{(i^*,j)}$, and has a lower confidence bound of $(\hat{\mu}^t_{(i^*,j)} - \beta(y^t_{(i^*,j)}, t))$. For $w^t_{(i',j)}$, the empirical mean is denoted by $\hat{\mu}^t_{(i',j)}$, and has an upper confidence bound of $(\hat{\mu}^t_{(i',j)} + \beta(y^t_{(i',j)}, t))$. The quantity that is of particular interest is the gap between the upper confidence bound on $\hat{\mu}^t_{(i',j)}$ and the lower confidence bound on $\hat{\mu}^t_{(i^*,j)}$. Intuitively, as we get increasing numbers of observations and the confidence widths shrink, this gap should reduce to below zero. We denote this quantity for task $o_j$ as follows:

$$\Delta^t_j = \left(\hat{\mu}^t_{(i',j)} + \beta(y^t_{(i',j)}, t)\right) - \left(\hat{\mu}^t_{(i^*,j)} - \beta(y^t_{(i^*,j)}, t)\right) \qquad (4)$$

Based on the ideas from Lucb-1, the algorithm can commit to worker $w^t_{(i^*,j)}$ for task $o_j$ whenever $\Delta^t_j < \epsilon$, and this is an $\epsilon$-optimal choice as long as the confidence intervals are not violated. Intuitively, we are taking the worst-case estimate of $\hat{\mu}^t_{(i^*,j)}$ and the highest of the best-case estimates from the remaining workers; ensuring that this difference is less than $\epsilon$ is sufficient to commit to worker $w^t_{(i^*,j)}$.

AExpSelect is shown in Algorithm 2. At each iteration, the algorithm first selects the task with the highest $\Delta^t_j$, denoted by index $o^t_q$ (Step 4). Then, it finds the corresponding workers $w^t_{(i^*,q)}$ and $w^t_{(i',q)}$ (Steps 5 and 6). Then, the greedy choice of action $a^t_{(p,q)}$ is based on choosing the worker with the higher confidence width among $w^t_{(i^*,q)}$ and $w^t_{(i',q)}$ (Steps 7 and 8). Note that the solution set $S^t$ is built over time. The algorithm maintains a set of tasks $R^t$ as the tasks for which a worker still needs to be selected. As soon as the condition on the quantity in Equation 4 is met for a task, that task is no longer considered for further actions and is removed from $R^t$. The algorithm terminates when $R^t$ is empty. Note that there is a common time clock across all the tasks. Jointly learning over all the tasks ensures that the algorithm can allocate more assignments to the tasks which have maximum uncertainty. Furthermore, it allows us to jointly exploit the side-observation graphs.
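The per-task quantities that drive the adaptive algorithm can be sketched as follows in Python. To avoid committing to a particular form of the confidence function, the sketch takes the confidence widths as inputs; the function names, the tie-breaking rule, and the example numbers are assumptions made for illustration only.

```python
def task_gap(mu_hat, width):
    """Leader, challenger, and confidence gap for one task.

    mu_hat[i] is the current estimate for worker i and width[i] its confidence width.
    The leader has the highest estimate; the challenger has the highest upper
    confidence bound among the remaining workers. Requires at least two workers.
    """
    workers = range(len(mu_hat))
    leader = max(workers, key=mu_hat.__getitem__)
    challenger = max((i for i in workers if i != leader),
                     key=lambda i: mu_hat[i] + width[i])
    gap = (mu_hat[challenger] + width[challenger]) - (mu_hat[leader] - width[leader])
    return leader, challenger, gap


def next_worker_to_test(mu_hat, width):
    """Pick whichever of leader/challenger has the larger confidence width (ties go to the leader)."""
    leader, challenger, _ = task_gap(mu_hat, width)
    return leader if width[leader] >= width[challenger] else challenger


# Hypothetical estimates for three workers on one task.
estimates = [0.8, 0.5, 0.6]
widths = [0.05, 0.10, 0.30]
leader, challenger, gap = task_gap(estimates, widths)
print(leader, challenger, round(gap, 3), next_worker_to_test(estimates, widths))
```

In this example the gap is still well above a tolerance such as 0.01, so the task would stay in the set of unresolved tasks, and the next test would go to the challenger because its confidence width is the larger one.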

Performance Analysis 

We now analyze the performance of the proposed algorithms UExpSelect and AExpSelect. Most of the results below can be derived using the proof techniques of Naive (Even-Dar, Mannor, and Mansour 2006) and Lucb-1 (Kalyanakrishnan et al. 2012), and can be seen as extensions of their results.


Performance Bounds for UExpSelect 


Let us consider the case of the absence of side-observations, which is equivalent to setting $E_w = \emptyset$ and $E_o = \emptyset$. In this case, for an action $a^t = a^t_{(p,q)}$, the observation set corresponds to the singleton set $X^t = \{X^t_{(p,q)}\}$, and both the optimal dominating set of $G_{wo}$ and its approximation $\mathrm{DOM}(G_{wo})$ are equal to $V_{wo}$. In fact, in terms of performance bounds, the algorithm UExpSelect can be seen as equivalent to running $M$ instances of Naive-$(\epsilon, \frac{\delta}{M})$. Based on Theorem 6 from Even-Dar, Mannor, and Mansour (2006), the sample complexity of Naive-$(\epsilon, \delta)$ for one instance of the problem with $N$ actions is given by $O\big(N \cdot \frac{1}{\epsilon^2} \ln(\frac{N}{\delta})\big)$. Hence, the sample complexity of UExpSelect in the absence of side-observations is given by $O\big(M \cdot N \cdot \frac{1}{\epsilon^2} \ln(\frac{M \cdot N}{\delta})\big)$. The PAC-$(\epsilon, \delta)$ guarantees follow directly from the correctness of Naive-$(\epsilon, \delta)$. The fact that we run $M$ instances of Naive with $\frac{\delta}{M}$ ensures, by the union bound, that the error probability is bounded by $\delta$. Next, we state the improvement in performance obtained by accounting for side-observations in Theorem 1.


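Theorem 1. The algorithm UExpSelect is $(\epsilon, \delta)$-PAC optimal with sample complexity of $O\big(\gamma_{G_{wo}} \cdot \frac{1}{\epsilon^2} \ln(\frac{M \cdot N}{\delta})\big)$, where $G_{wo} = G_w \cup G_o$ and $\gamma_{G_{wo}} \le (1 + \ln(1 + \mathrm{DEG}(G_{wo}))) \cdot \gamma^*_{G_{wo}}$.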
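To give a feel for the magnitudes behind these bounds, the following Python sketch evaluates the uniform-exploration budget. The constants $4/\epsilon^2$ and $\ln(2N/\delta)$ are those commonly stated for the Naive algorithm and should be treated as indicative only, since the bounds above hide constants in the $O(\cdot)$ notation; the function names and the `gamma` parameter are hypothetical.

```python
import math

def naive_tests_per_pair(eps, delta, n_workers, n_tasks):
    """Tests per action when each task runs Naive with failure probability delta / n_tasks.
    Constants are assumed (indicative), not taken from the paper."""
    per_task_delta = delta / n_tasks
    return math.ceil((4.0 / eps ** 2) * math.log(2.0 * n_workers / per_task_delta))

def uniform_total_budget(eps, delta, n_workers, n_tasks, gamma=None):
    """Total tests when gamma actions must be taken per pass.
    gamma defaults to n_tasks * n_workers, i.e. no side-observations."""
    if gamma is None:
        gamma = n_tasks * n_workers
    return gamma * naive_tests_per_pair(eps, delta, n_workers, n_tasks)

# Same per-action test count, fewer actions when a dominating set of size 600 is used.
no_graph = uniform_total_budget(0.05, 0.05, n_workers=200, n_tasks=10)
with_graph = uniform_total_budget(0.05, 0.05, n_workers=200, n_tasks=10, gamma=600)
```

The logarithmic term is unchanged by the graph; only the leading factor shrinks from $M \cdot N$ to $\gamma_{G_{wo}}$.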

Recall that $\mathrm{DOM}(G_{wo})$ denotes the polynomial-time approximation of the dominating set for $G_{wo}$, and its size $\gamma_{G_{wo}}$ is bounded by $(1 + \ln(1 + \mathrm{DEG}(G_{wo}))) \cdot \gamma^*_{G_{wo}}$ (Guha and Khuller 1998). By taking each action of $\mathrm{DOM}(G_{wo})$ once, the entire set of actions is covered. Hence, by taking $\gamma_{G_{wo}}$ actions, we get observations of the $M \cdot N$ actions, resulting in a potential saving of tests by a factor of $\frac{M \cdot N}{\gamma_{G_{wo}}}$. Importantly, the greedy way of selecting the actions in Step 4 of Algorithm 1 ensures that all of the actions in $\mathrm{DOM}(G_{wo})$ are scanned uniformly.
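A dominating set of this kind can be approximated with the standard greedy set-cover heuristic, sketched below in Python. This is a generic illustration under the assumption of an undirected graph given as adjacency sets; it may differ from the exact procedure used in the paper, and the function name is hypothetical.

```python
def greedy_dominating_set(adj):
    """adj: dict mapping each vertex to the set of its neighbours (undirected graph).
    Returns a list of vertices whose closed neighbourhoods cover every vertex."""
    uncovered = set(adj)
    chosen = []
    while uncovered:
        # Pick the vertex whose closed neighbourhood covers the most uncovered vertices.
        v = max(adj, key=lambda u: len(({u} | adj[u]) & uncovered))
        chosen.append(v)
        uncovered -= {v} | adj[v]
    return chosen

# A star with centre 0 plus an isolated vertex 5: two vertices dominate the graph.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: set()}
dom = greedy_dominating_set(star)
```

When the graph has no edges, every vertex must be chosen, which matches the case without side-observations where the dominating set equals the full vertex set.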






























Performance Bounds for AExpSelect 

Let us define $\Delta^{a}_{(i,j)} = \max\{\Delta_{(i,j)}, a\}$, for any value of $a > 0$. In particular, we are interested in the quantities $\Delta^{\epsilon/2}_{(i,j)}$. Let us again begin by considering the case of the absence of side-observations. One way to tackle this problem is then to run $M$ instances of the Lucb-1-$(\epsilon, \frac{\delta}{M})$ algorithm, each with its own time clock. Based on Theorem 6 from Kalyanakrishnan et al. (2012), the expected sample complexity of Lucb-1-$(\epsilon, \delta)$ for one instance of the problem with $N$ actions for a particular task $o_j$ is given by:

$$O\Bigg(\Big(\sum_{i \in [N]} \frac{1}{\big(\Delta^{\epsilon/2}_{(i,j)}\big)^2}\Big) \ln\Bigg(\frac{\sum_{i \in [N]} \frac{1}{(\Delta^{\epsilon/2}_{(i,j)})^2}}{\delta}\Bigg)\Bigg) \qquad (5)$$
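As a small worked illustration of the quantity in Equation 5, the Python sketch below computes the clipped-gap hardness of one task and the resulting bound, up to the constant hidden in the $O(\cdot)$ notation. The function names and example gaps are hypothetical.

```python
import math

def hardness(gaps, eps):
    """Sum over actions of 1 / max(gap, eps / 2)^2 for one task."""
    return sum(1.0 / max(g, eps / 2.0) ** 2 for g in gaps)

def lucb_bound(gaps, eps, delta):
    """Value of Equation 5 without the constant hidden in the O(.) notation."""
    h = hardness(gaps, eps)
    return h * math.log(h / delta)

# Two easy workers (gap 0.5) and one near-optimal worker whose gap is clipped at eps / 2.
h = hardness([0.5, 0.5, 0.01], eps=0.1)   # 4 + 4 + 400, up to rounding
```

The near-optimal worker dominates the sum, which is why tasks with small gaps absorb most of the budget.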


The expected sample complexity of running $M$ instances of Lucb-1-$(\epsilon, \frac{\delta}{M})$ is then given by:

$$O\Bigg(\sum_{j \in [M]} \Bigg(\Big(\sum_{i \in [N]} \frac{1}{\big(\Delta^{\epsilon/2}_{(i,j)}\big)^2}\Big) \ln\Bigg(\frac{M \cdot \sum_{i \in [N]} \frac{1}{(\Delta^{\epsilon/2}_{(i,j)})^2}}{\delta}\Bigg)\Bigg)\Bigg) \qquad (6)$$

However, by jointly learning across all the tasks, an algorithm can adaptively allocate assignments across the tasks. AExpSelect is based on this idea, originally proposed by Gabillon et al. (2011) and Wang, Viswanathan, and Bubeck (2013), and it extends the Lucb-1 algorithm to this joint setting. Intuitively, the main reason this is possible in best-action selection problems is that the problem complexity is defined in terms of the relative "gaps" $\Delta_{(i,j)}$, which can be mixed together for all the tasks $o_j$ to create one pool of $M \cdot N$ actions defined by their corresponding gaps $\Delta_{(i,j)}$. Then, by using a common time clock over these $M \cdot N$ actions, the main technical results of Lucb-1 extend to this joint setting (Wang, Viswanathan, and Bubeck 2013). The sample complexity of AExpSelect is given in Theorem 2, which is based on Theorem 6 from Kalyanakrishnan et al. (2012).

Theorem 2. The algorithm AExpSelect is $(\epsilon, \delta)$-PAC optimal with expected sample complexity given by

$$O\Bigg(\Big(\sum_{j \in [M]} \sum_{i \in [N]} \frac{1}{\big(\Delta^{\epsilon/2}_{(i,j)}\big)^2}\Big) \ln\Bigg(\frac{\sum_{j \in [M]} \sum_{i \in [N]} \frac{1}{(\Delta^{\epsilon/2}_{(i,j)})^2}}{\delta}\Bigg)\Bigg)$$

Note that the above sample complexity bound is similar in structure to the one given in Equation 5, with a total of $M \cdot N$ actions. However, it is different from the one obtained in Equation 6 by running $M$ instances of Lucb-1-$(\epsilon, \frac{\delta}{M})$ with a separate time clock for each task. In fact, when all the tasks are of equal hardness, defined by the quantity $\sum_{i \in [N]} \frac{1}{(\Delta^{\epsilon/2}_{(i,j)})^2}$ for a given $o_j$, the sample complexities in Theorem 2 and Equation 6 are the same.

The bound in Theorem 2 is loose in the sense that it does not explicitly account for the performance gain achieved by the side-observations, even though AExpSelect uses the same approach as UExpSelect to exploit side-observation graphs. In the worst case, the static model of side-observations that we used (i.e., a pre-computed and fixed dominating set) does not help boost the performance of an adaptive algorithm. Intuitively, and as we observed during empirical evaluations, for problem instances that are uniformly difficult, we tend to gain more value from side-observations. However, in such cases, AExpSelect tends to behave more like UExpSelect. For tasks and workers that are more skewed in terms of difficulty and performance, the workers and tasks that are "easier" to identify get "eliminated" over time, and hence the value of side-observations diminishes as well. Hence, for adaptive algorithms like AExpSelect, a more effective way of exploiting side-observations would need policies that construct dynamic dominating sets at every time step, taking into account the remaining uncertainties over the actions.

Experimental Evaluation

We now report on the results of our experiments.


Experimental Setup and Datasets 

We compare the performance of the adaptive algorithm AExpSelect against the uniform-exploration-based algorithm UExpSelect. Furthermore, we quantify the effect of side-observations by comparing these two algorithms with their variants without side-observation graphs (setting $E_o = \emptyset$, $E_w = \emptyset$ as input).

Metrics and parameters. The primary metric is the quality of the team output by the algorithm for a given budget, measured through i) average precision, and ii) average performance gap, as defined next. For a given output $S$ and any task $o_j$, the precision for task $o_j$ is defined to be 1 if $S$ contains an $\epsilon$-optimal worker for task $o_j$ (i.e., the intersection of $S$ with the set of $\epsilon$-optimal workers for $o_j$ is non-empty), and 0 otherwise. The performance gap for a task $o_j$ is defined to be $\big(\mu_{(i^*,j)} - \max_{w_i \in S} \mu_{(i,j)}\big)$. We report the average precision and average performance gap over all the $M$ tasks for the team output by the algorithm for a given budget.
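The two metrics can be computed as in the following Python sketch, which assumes the true performance matrix is available as a list of per-task rows and that a worker is $\epsilon$-optimal when its performance is within $\epsilon$ of the best worker for that task. The function name and data layout are hypothetical.

```python
def evaluate_team(mu, team, eps):
    """mu[j][i]: true performance of worker i on task j; team: iterable of worker indices.
    Returns (average precision, average performance gap) over all tasks."""
    team = list(team)
    precisions, gaps = [], []
    for row in mu:
        best = max(row)                           # best worker overall for this task
        best_in_team = max(row[i] for i in team)  # best worker within the team
        gap = best - best_in_team
        precisions.append(1.0 if gap <= eps else 0.0)
        gaps.append(gap)
    m = len(mu)
    return sum(precisions) / m, sum(gaps) / m

# Two tasks, three workers; the team {0, 2} misses the best worker for the second task.
precision, gap = evaluate_team([[0.9, 0.5, 0.1], [0.2, 0.8, 0.7]], team={0, 2}, eps=0.05)
```

Precision is a 0/1 measure per task, whereas the performance gap also reflects how far the selected team is from the best worker when it misses.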

The primary quantity that we vary in the experiments is the total number of tests performed, or budget spent, by the algorithm. For ease of interpretation, we use the unit of the average budget spent per worker/task pair. We also report results obtained by varying the hardness of the problem instance (Figure 2(c)). For a given task $o_j$, we use the notion of hardness given by $\Delta^{min}_j = \min_{i \in [N]} \Delta_{(i,j)}$. We vary the average value of the gap $\Delta^{min}_j$ over tasks (i.e., $\frac{1}{M} \sum_{j \in [M]} \Delta^{min}_j$) by creating different datasets and measuring the performance of the different algorithms for a fixed budget.

The PAC parameters $\epsilon$ and $\delta$ are fixed for all of the reported experiments and set to 0.05. The number of tasks is $M = 10$ and the total number of workers is $N = 200$. In all of the experiments with varying budget, the average $\Delta^{min}_j$ over tasks is fixed to 0.25, with $\Delta^{min}_j$ for a task $o_j$ uniformly sampled from the range [0.01, 0.5]. For the experiment in Figure 2(c), where the average $\Delta^{min}_j$ is varied, the average budget per worker/task pair is fixed to 20, i.e., equivalent to a total budget of $M \cdot N \cdot 20$. The values of the performance matrix $\mu$ are scaled to lie in the range $[\mu^{min}, \mu^{max}]$, where $\mu^{min} = 0.1$ and $\mu^{max} = 0.9$. We assume a Bernoulli feedback model, i.e., assigning task $o_j$ to worker $w_i$ yields a feedback value of 1 with probability $\mu_{(i,j)}$ and 0 otherwise. All the results are reported as an average over 10 iterations of the algorithms.

[Figure 2 plots, synthetic data: (a) precision with increasing budget; (b) performance gap with increasing budget; (c) precision with decreasing average gap $\Delta^{min}$. The x-axis is the average budget per worker/task pair in (a) and (b), and the average $\Delta^{min}$ in the data in (c).]

Figure 2: Experimental results on synthetic data in the absence of side-observation graphs. In Figures 2(a) and 2(b), the budget is varied, and the metrics of average precision and average performance gap are measured, respectively. In Figure 2(c), the budget is kept fixed at 20 per worker/task pair, and the average $\Delta^{min}$ is changed from 0.25 to 0.05, making the problem instance more difficult.

[Figure 3 plots, oDesk data, precision with increasing budget: (a) without side-observations; (b) side-observations over tasks; (c) side-observations over tasks and workers. The x-axis is the average budget per worker/task pair.]

Figure 3: Experimental results on oDesk data. In all the plots, the budget is varied and the metric of average precision is measured. In Figure 3(a), there is no side-observation graph, and the setup is equivalent to that of the plot in Figure 2(a). Figures 3(b) and 3(c) show the comparison of AExpSelect and UExpSelect with their variants without side-observations ($E_o = \emptyset$, $E_w = \emptyset$ as input).


Synthetic data. We created synthetic data for $N = 200$ workers and $M = 10$ tasks as follows. For each task $o_j$, we sampled $\Delta^{min}_j$ uniformly at random from the range [0.01, 0.5] (to have an average $\Delta^{min}_j = 0.25$). Then, to create the performance vector $\mu_{(\cdot,j)}$ for the $N$ workers (corresponding to a column in the performance matrix in Figure 1), we sampled $(N - 1)$ values in the range $[\mu^{min}, \mu^{max} - \Delta^{min}_j]$, and one value (that of the best worker) is set to $\mu^{max}$. These $N$ values are then randomly permuted and assigned to the $N$ workers for task $o_j$. This process is repeated for each of the $M$ tasks independently. For the synthetic experiments, we did not use side-observations, which is equivalent to having $E_w = \emptyset$ and $E_o = \emptyset$. For the experiment reported in Figure 2(c), we created 4 more variants of the synthetic data by varying the average $\Delta^{min}_j$ as [0.25, 0.20, 0.15, 0.10, 0.05].
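The generation procedure and the Bernoulli feedback model described above can be sketched in Python as follows. The function names, the task-major layout of the returned matrix, and the seeding are illustrative assumptions; the sketch samples the non-best workers uniformly and does not reproduce how the dataset variants with other average gaps were built.

```python
import random

def make_task_column(n_workers, gap_min, mu_min=0.1, mu_max=0.9, rng=random):
    """Performance values of all workers for one task."""
    values = [rng.uniform(mu_min, mu_max - gap_min) for _ in range(n_workers - 1)]
    values.append(mu_max)   # the best worker sits gap_min above everyone else
    rng.shuffle(values)     # random assignment of values to workers
    return values

def make_performance_matrix(n_workers=200, n_tasks=10, seed=0):
    """Returns (mu, gaps) with mu[j][i] the performance of worker i on task j."""
    rng = random.Random(seed)
    gaps = [rng.uniform(0.01, 0.5) for _ in range(n_tasks)]
    return [make_task_column(n_workers, g, rng=rng) for g in gaps], gaps

def bernoulli_feedback(mu_ij, rng=random):
    """Feedback of 1 with probability mu_ij, and 0 otherwise."""
    return 1 if rng.random() < mu_ij else 0

mu, gaps = make_performance_matrix()
```

Each row of `mu` has exactly one worker at the maximum value, and every other worker is at least the sampled gap below it.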


oDesk data. The primary purpose of using data from oDesk is to obtain real-world distributions of the performance matrix, as well as a realistic way of creating the side-observation graphs. oDesk has over 2.7 million freelancers and 0.5 million job requesters worldwide. We used the publicly available API¹ from oDesk to obtain the data described below. On the oDesk platform, each posted task or job is assigned to a category of a predefined taxonomy by the job requester. There are 12 top-level categories of tasks and about 90 second-level categories. We took $M = 10$ tasks, with 4 tasks in the top-level category Design & Creative, 3 tasks in the top-level category Translation, and 3 tasks in Data Science & Analytics. We note that this choice is arbitrary and does not affect the reported results qualitatively. We also performed experiments on other variants of the oDesk datasets that considered different sets of task types.

¹ https://developers.odesk.com/

Each worker on oDesk has a profile with rich metadata available via an API. In particular, the fields that are of particular interest to us include: i) the "skills" (a set of free-form text tags that workers can assign to themselves); ii) the feedback score based on previous tasks completed; iii) the number of hours worked; and iv) the top-level categories of the tasks completed by the workers, based on which the feedback score is aggregated. We crawled a sample of 200 workers by issuing a specific query.² The skills in this query were chosen so as to ensure that the jobs completed by the workers in the retrieved list have some overlap with the top-level categories of the $M$ tasks; otherwise, this overlap would be low for a randomly retrieved list of workers. In a realistic setting, this overlap is expected, as workers bid for tasks based on their skills and job profile. The number of hours worked was set to a minimum of 100 to ensure that sufficient feedback is available for the workers, given that feedback is generally sparse.

² {'hours': '[100 TO 10000]', 'skills': 'cartooning OR machine-learning OR translation'}

We created the side-observation graphs as follows. We add an edge between two tasks $o_x$ and $o_z$, i.e., $\{o_x, o_z\} \in E_o$, if these two tasks belong to the same top-level category. In our setting, this results in 3 disconnected cliques among the 10 tasks. For the workers, we computed the Jaccard coefficient between the skills of any two workers. We add an edge between two workers $w_x$ and $w_z$, i.e., $\{w_x, w_z\} \in E_w$, if the Jaccard coefficient between $w_x$ and $w_z$ is above a certain threshold (chosen to be 0.3 for the reported results). Next, we create the performance matrix from the feedback scores in a manner similar to the approach we took with the synthetic data. First, for each task $o_j$, we sampled $\Delta^{min}_j$ uniformly at random from the range [0.01, 0.5]. Then, for a given worker $w_i$ and task $o_j$, we look at the feedback score that $w_i$ obtained in the historically completed tasks that belong to the same top-level category as $o_j$. Note that this feedback score is a rating in the range [0, 5]. When available, this feedback score is used for $\mu_{(i,j)}$; otherwise, the feedback score is randomly sampled from [0, 3]. These feedback scores are then scaled to lie in the range $[\mu^{min}, \mu^{max} - \Delta^{min}_j]$, except for the best worker for $o_j$, whose $\mu_{(i,j)}$ is set to $\mu^{max}$. This process is repeated for each of the $M$ tasks independently.
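The construction of the two side-observation graphs described above can be sketched in Python as follows. The input layout (dictionaries of skill sets and of category labels) and the function names are assumptions made for illustration; an edge is represented as a `frozenset` of its two endpoints.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard coefficient of two sets; taken to be 0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def worker_edges(skills, threshold=0.3):
    """skills: dict worker id -> set of skill tags.
    Two workers are connected if their Jaccard coefficient exceeds the threshold."""
    return {frozenset((x, z)) for x, z in combinations(skills, 2)
            if jaccard(skills[x], skills[z]) > threshold}

def task_edges(category):
    """category: dict task id -> top-level category.
    Two tasks are connected if they share the same top-level category."""
    return {frozenset((x, z)) for x, z in combinations(category, 2)
            if category[x] == category[z]}

categories = {f"o{k}": c for k, c in enumerate(
    ["Design"] * 4 + ["Translation"] * 3 + ["Data Science"] * 3)}
e_o = task_edges(categories)   # three cliques: 6 + 3 + 3 = 12 edges
```

With the 4/3/3 split of tasks over categories, the task graph consists of three disconnected cliques, as noted above.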


Results 


We now discuss the findings from our experiments. 

Varying budget and measuring precision. Figure 2(a) and Figure 3(a) show the results of varying the average budget spent per worker/task pair, and how this leads to increased precision of the team selected by UExpSelect and AExpSelect. For these results, the average $\Delta^{min}_j = 0.25$, and it is the same for both the synthetic data (Figure 2(a)) and the oDesk data (Figure 3(a)). For both datasets, AExpSelect shows significantly faster convergence towards selecting the optimal team. For instance, in Figure 2(a), AExpSelect achieved over 90% precision (getting the $\epsilon$-optimal worker for 9 out of 10 tasks) at a budget of $20 \cdot M \cdot N$, whereas UExpSelect requires substantially more budget to achieve the same precision. The difference in performance of AExpSelect or UExpSelect across the synthetic and oDesk datasets can be attributed to the different distributions of the workers' performances across the datasets. In particular, in the oDesk data, the performance values of the workers are more skewed towards higher values, making it a more challenging problem instance in comparison to the synthetic data, where the performance values are sampled uniformly.

Varying budget and measuring performance gap. Figure 2(b) shows an alternate view of the corresponding result in Figure 2(a). While Figure 2(a) reported 0/1 loss, Figure 2(b) reports the average of the actual performance gap of the best worker for a task in the output set compared to the best worker in the full set.

Varying hardness of problem instance. In Figure 2(c), the budget is kept fixed at 20 per worker/task pair, and the average $\Delta^{min}_j$, quantifying the hardness of the problem instance, is changed from 0.25 to 0.05. The gain of adaptive assignments in AExpSelect compared to UExpSelect is consistent, though both algorithms degrade in performance, as expected.

Effect of exploiting side-observations. In Figure 3(a), there is no side-observation graph ($E_o = \emptyset$, $E_w = \emptyset$), and $\gamma_{G_{wo}}$ is simply equal to $M \cdot N$. In Figure 3(b), there is a side-observation graph over the tasks, as described in the data generation; however, no graph is used over the workers ($E_w = \emptyset$). In this case, $\gamma_{G_{wo}}$ as computed by the greedy algorithm is equal to 600. Figure 3(c) shows results that consider side-observation graphs over both tasks and workers, with $\gamma_{G_{wo}} = 386$. Both algorithms see a significant boost in terms of faster learning by exploiting the side-observations. Furthermore, we can see that the boost in performance from adding side-observations is larger for UExpSelect than for AExpSelect, as discussed during the theoretical performance analysis of the algorithms.
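As a rough arithmetic illustration of these dominating-set sizes, the Python sketch below computes the ratio of the $M \cdot N$ worker/task pairs to $\gamma_{G_{wo}}$ for the three settings. This ratio is only the potential per-pass saving suggested by the performance analysis, not a measured speed-up, and the function name is hypothetical. The value 600 is consistent with testing one representative task per clique for each of the 200 workers (3 x 200), which is an inference from the graph construction rather than a statement from the paper.

```python
def saving_factor(n_tasks, n_workers, gamma):
    """Ratio of worker/task pairs to the dominating-set size gamma."""
    return n_tasks * n_workers / gamma

settings = [("no graph", 2000), ("task graph", 600), ("task and worker graphs", 386)]
for label, gamma in settings:
    # Prints 1.00, 3.33 and 5.18 for the three settings, respectively.
    print(f"{label}: {saving_factor(10, 200, gamma):.2f}")
```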


Conclusions and Future Work 

We presented an algorithmic approach to tackle the challenge of the efficient hiring of teams of workers, as faced by recruiters for contract-based crowdsourcing. By casting these budgeted decision-theoretic problems as an instance of online learning for best action selection, we designed algorithms with PAC bounds, and further extended them to exploit the commonalities among the tasks and the workers. Our methodology and results present an interesting direction of continued research for the problem of hiring a team for contract-based crowdsourcing.

We see several interesting directions in which the current work can be extended. In particular, we used a simple notion of quantifying the optimality of the team. We see promise in extending the results to incorporate more complex relations among team members, such as the matching of task types within teams to balance the workload, capturing diminishing returns of growing teams, learning and representing costs associated with communication and coordination among people with different skills and abilities (including collaborative competency), and other combinatorial constraints. Furthermore, we are interested in developing more realistic models of side-observations and performing real-world experiments using those models.